Remarkable Similarity of Clausal Coordinate Ellipsis in Russian Compared to Dutch, Estonian, German, and Hungarian

Authors

  • Karin Harbusch
  • Denis Krusko
Abstract

Elliptical constructions can help to avoid repetition of identical constituents during natural-language generation. Executable rules for ellipsis are not easy to extract from grammar books, in our case for Russian. Therefore we follow a different strategy: we test the accuracy of a rule set that has been evaluated for two Germanic languages, Dutch and German, and two Finno-Ugric languages, Estonian and Hungarian. For a Russian test corpus of about 100 syntactically annotated coordinated sentences that systematically vary the conditions of rule application, our Java program can automatically produce all elliptical variants. Over- and undergeneration in the resulting lists have been tested in two experiments with native speakers. Overall, the rules work very well for Russian. Among the four target languages, Russian works best with the Estonian amendments. Here we report two slight deviations, partially known from the linguistic literature.


Similar resources

A Comparison of Clausal Coordinate Ellipsis in Estonian and German: Remarkably Similar Elision Rules Allow a Language-Independent Ellipsis-Generation Module

We compare the phenomena of clausal coordinate ellipsis in Estonian, a Finno-Ugric language, and German, an Indo-European language. The rules underlying these phenomena appear to be remarkably similar. Thus, the software module ELLEIPO, which was originally developed to generate clausal coordinate ellipsis in German and Dutch, works for Estonian as well. In order to extend ELLEIPO’s coverage to...


Clausal Coordinate Ellipsis and its Varieties in Spoken German: A Study with the TüBa-D/S Treebank of the VERBMOBIL Corpus

Grammar rules for Clausal Coordinate Ellipsis (CCE) are based nearly exclusively on linguistic judgments (intuitions). For German, the extent to which grammar rules based on this type of empirical evidence generate all and only CCE structures that populate text corpora, has only been explored with the TIGER treebank of written newspaper text. How well these rules fit spoken German is unknown. I...


Incremental sentence production and clausal coordinate ellipsis : 


From two corpus studies into varieties of clausal coordination in English (Meyer, 1995 and Greenbaum & Nelson, 1999), it is known that the incidence of clausal coordinate ellipsis (CCE) is about two times higher in written than in spoken language. We present a treebank study into CCE in written and spoken Dutch and German which confirms this tendency. Moreover, we observe considerable differenc...


The Multilingual Paraphrase Database

We release a massive expansion of the paraphrase database (PPDB) that now includes a collection of paraphrases in 23 different languages. The resource is derived from large volumes of bilingual parallel data. Our collection is extracted and ranked using state-of-the-art methods. The multilingual PPDB has over a billion paraphrase pairs in total, covering the following languages: Arabic, Bulgari...


Clausal Coordinate Ellipsis in German: The TIGER Treebank as a Source of Evidence

Syntactic parsers and generators need high-quality grammars of coordination and coordinate ellipsis, structures that occur very frequently but are much less well understood theoretically than many other domains of grammar. Modern grammars of coordinate ellipsis are based nearly exclusively on linguistic judgments (intuitions). The extent to which grammar rules based on this type of empirical evid...




Publication date: 2015